Empirical Sufficiency Lower Bounds for Language Modeling with Locally-Bootstrapped Semantic Structures
In this work we build upon negative results from an attempt at language
modeling with predicted semantic structure, in order to establish empirical
lower bounds on what could have made the attempt successful. More specifically,
we design a concise binary vector representation of semantic structure at the
lexical level and evaluate in-depth how good an incremental tagger needs to be
in order to achieve better-than-baseline performance with an end-to-end
semantic-bootstrapping language model. We envision such a system as consisting
of a (pretrained) sequential-neural component and a hierarchical-symbolic
component working together to generate text with low surprisal and high
linguistic interpretability. We find that (a) dimensionality of the semantic
vector representation can be dramatically reduced without losing its main
advantages and (b) lower bounds on prediction quality cannot be established via
a single score alone, but need to take the distributions of signal and noise
into account.
Comment: To appear at *SEM 2023, Toronto.
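As a rough illustration of the sufficiency-bound question posed above, the sketch below simulates an incremental tagger of varying accuracy by flipping bits of gold binary semantic vectors and records where a placeholder downstream score crosses a baseline. The scoring function, dimensionality, and baseline value are assumptions made for the example, not the paper's setup, which, as noted, also requires looking at the distributions of signal and noise rather than a single score.

```python
# Hypothetical sketch: sweep the accuracy of a simulated incremental tagger
# by corrupting gold binary semantic vectors, and record the lowest accuracy
# at which a stand-in downstream score still beats a semantics-free baseline.
import numpy as np

rng = np.random.default_rng(0)

def corrupt(gold_vectors: np.ndarray, accuracy: float) -> np.ndarray:
    """Flip each bit independently with probability (1 - accuracy)."""
    flips = rng.random(gold_vectors.shape) > accuracy
    return np.where(flips, 1 - gold_vectors, gold_vectors)

def downstream_score(vectors: np.ndarray, gold: np.ndarray) -> float:
    """Stand-in for the language-model evaluation: mean bitwise agreement."""
    return float((vectors == gold).mean())

gold = rng.integers(0, 2, size=(1000, 32))   # 32-dim binary semantic vectors (toy)
baseline = 0.85                              # assumed score of the semantics-free baseline

for accuracy in np.linspace(0.5, 1.0, 11):
    score = downstream_score(corrupt(gold, accuracy), gold)
    status = "beats baseline" if score > baseline else "below baseline"
    print(f"tagger accuracy {accuracy:.2f} -> score {score:.3f} ({status})")
```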
Is Structure Necessary for Modeling Argument Expectations in Distributional Semantics?
Despite the number of NLP studies dedicated to thematic fit estimation,
little attention has been paid to the related task of composing and updating
verb argument expectations. The few exceptions have mostly modeled this
phenomenon with structured distributional models, implicitly assuming a
similarly structured representation of events. Recent experimental evidence,
however, suggests that the human processing system could also exploit an
unstructured "bag-of-arguments" type of event representation to predict
upcoming input. In this paper, we re-implement a traditional structured model
and adapt it to compare the different hypotheses concerning the degree of
structure in our event knowledge, evaluating their relative performance in the
task of argument expectation update.
Comment: conference paper, IWCS
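A toy sketch of the contrast under comparison, assuming simple additive vectors: a structured model scores a candidate filler against a role-specific expectation, while a "bag-of-arguments" model pools all previously seen arguments regardless of role. The vectors and similarity measure are placeholders, not the paper's re-implemented model.

```python
# Illustrative sketch (not the paper's implementation): structured vs.
# unstructured "bag-of-arguments" expectation update, with a candidate
# filler scored by cosine similarity against each expectation.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
dim = 50
# Toy distributional vectors for arguments already seen in the sentence.
seen = {("eat", "nsubj"): rng.random(dim), ("eat", "dobj"): rng.random(dim)}
candidate = rng.random(dim)  # vector of an upcoming argument

# Structured model: expectation for the dobj role only.
structured_expectation = seen[("eat", "dobj")]

# Bag-of-arguments model: expectation pooled over all arguments, roles ignored.
bag_expectation = np.mean(list(seen.values()), axis=0)

print("structured fit:", cosine(candidate, structured_expectation))
print("bag-of-arguments fit:", cosine(candidate, bag_expectation))
```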
Measuring Thematic Fit with Distributional Feature Overlap
In this paper, we introduce a new distributional method for modeling
predicate-argument thematic fit judgments. We use a syntax-based DSM to build a
prototypical representation of verb-specific roles: for every verb, we extract
the most salient second order contexts for each of its roles (i.e. the most
salient dimensions of typical role fillers), and then we compute thematic fit
as a weighted overlap between the top features of candidate fillers and role
prototypes. Our experiments show that our method consistently outperforms a
baseline re-implementing a state-of-the-art system, and achieves results that
are better than or comparable to those reported in the literature for other
unsupervised systems. Moreover, it provides an explicit representation of the
features characterizing verb-specific semantic roles.
Comment: 9 pages, 2 figures, 5 tables, EMNLP 2017. Keywords: thematic fit,
selectional preference, semantic roles, DSMs (Distributional Semantic Models),
Vector Space Models (VSMs), cosine, APSyn, similarity, prototypes
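The weighted-overlap idea can be illustrated with a minimal sketch: the role prototype and the candidate filler are treated as sparse feature vectors, and thematic fit is scored on the overlap of their top-k features. The salience weights and the exact normalization below are placeholder choices, not the paper's formulation.

```python
# Minimal sketch of weighted feature overlap between a verb-specific role
# prototype and a candidate filler, both given as feature -> salience maps.
def top_k(features: dict, k: int) -> dict:
    return dict(sorted(features.items(), key=lambda kv: kv[1], reverse=True)[:k])

def weighted_overlap(prototype: dict, candidate: dict, k: int = 3) -> float:
    p, c = top_k(prototype, k), top_k(candidate, k)
    shared = set(p) & set(c)
    if not shared:
        return 0.0
    # Sum the smaller of the two weights for each shared feature, normalized
    # by the total weight of the prototype's top features (placeholder formula).
    return sum(min(p[f], c[f]) for f in shared) / sum(p.values())

# Toy salience scores for the dobj role of "eat" and a candidate filler "apple".
role_prototype = {"fruit": 0.9, "food": 0.8, "sweet": 0.5, "red": 0.2}
candidate_filler = {"fruit": 0.7, "red": 0.6, "round": 0.4, "food": 0.3}
print(weighted_overlap(role_prototype, candidate_filler))
```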
Logical Metonymy in a Distributional Model of Sentence Comprehension
In theoretical linguistics, logical metonymy is defined as the combination of an event-subcategorizing verb with an entity-denoting direct object (e.g., The author began the book), so that the interpretation of the VP requires the retrieval of a covert event (e.g., writing). Psycholinguistic studies have revealed extra processing costs for logical metonymy, a phenomenon generally explained with the introduction of new semantic structure. In this paper, we present a general distributional model for sentence comprehension inspired by the Memory, Unification and Control model by Hagoort (2013, 2016). We show that our distributional framework can account for the extra processing costs of logical metonymy and can identify the covert event in a classification task.
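A hedged sketch of the covert-event retrieval step, using random stand-ins for distributional vectors and plain vector addition as the composition function (the paper's MUC-inspired model is richer than this):

```python
# Toy covert-event retrieval for logical metonymy: compose the vectors of the
# overt words ("author", "begin", "book") and rank candidate covert events by
# cosine similarity to the composed representation.
import numpy as np

rng = np.random.default_rng(2)
dim = 50
vectors = {w: rng.random(dim) for w in ["author", "begin", "book", "write", "read", "burn"]}

composed = vectors["author"] + vectors["begin"] + vectors["book"]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

candidates = ["write", "read", "burn"]
ranking = sorted(candidates, key=lambda v: cosine(composed, vectors[v]), reverse=True)
print("ranked covert events:", ranking)
```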
Compositionality as an Analogical Process: Introducing ANNE
Usage-based constructionist approaches consider language a structured inventory of constructions, form-meaning pairings of different schematicity and complexity, and claim that the more a linguistic pattern is encountered, the more it becomes accessible to speakers. However, when an expression is unavailable, what processes underlie its interpretation? While traditional answers rely on the principle of compositionality, according to which the meaning is built word by word and incrementally, usage-based theories argue that novel utterances are created on the basis of previously experienced ones through analogy, mapping an existing structural pattern onto a novel instance. Starting from this theoretical perspective, we propose here a computational implementation of these assumptions. As the principle of compositionality has been used to generate distributional representations of phrases, we propose a neural network simulating the construction of phrasal embeddings as an analogical process. Our framework, inspired by word2vec and computer vision techniques, was evaluated on tasks of generalization from existing vectors.
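As a purely illustrative sketch of learning phrase embeddings from word pairs, a small network can be trained to map two word vectors onto an observed phrase vector; the architecture and loss below are assumptions made for the example, not ANNE's actual design.

```python
# Illustrative only: a tiny network that learns to map (word1, word2) vector
# pairs to phrase vectors from observed training triples.
import torch
import torch.nn as nn

dim = 50
model = nn.Sequential(nn.Linear(2 * dim, 128), nn.ReLU(), nn.Linear(128, dim))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Random stand-ins for distributional vectors of word pairs and the
# corpus-observed vector of the whole phrase.
w1, w2 = torch.randn(256, dim), torch.randn(256, dim)
phrase = torch.randn(256, dim)

for _ in range(100):
    optimizer.zero_grad()
    pred = model(torch.cat([w1, w2], dim=-1))
    loss = loss_fn(pred, phrase)
    loss.backward()
    optimizer.step()
print("final reconstruction loss:", loss.item())
```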
Extensive Evaluation of Transformer-based Architectures for Adverse Drug Events Extraction
Adverse Event (ADE) extraction is one of the core tasks in digital
pharmacovigilance, especially when applied to informal texts. This task has
been addressed by the Natural Language Processing community using large
pre-trained language models, such as BERT. Despite the great number of
Transformer-based architectures used in the literature, it is unclear which of
them performs better and why. Therefore, in this paper we perform an
extensive evaluation and analysis of 19 Transformer-based models for ADE
extraction on informal texts. We compare the performance of all the considered
models on two datasets with increasing levels of informality (forum posts and
tweets). We also combine the purely Transformer-based models with two
commonly-used additional processing layers (CRF and LSTM), and analyze their
effect on the models' performance. Furthermore, we use a well-established
feature importance technique (SHAP) to correlate the performance of the models
with a set of features that describe them: model category (AutoEncoding,
AutoRegressive, Text-to-Text), pretraining domain, training from scratch, and
model size in number of parameters. At the end of our analyses, we identify a
list of take-home messages that can be derived from the experimental data.
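One of the evaluated configurations, a pretrained Transformer encoder with a BiLSTM layer on top for token-level ADE tagging, can be sketched as follows; the model name, label set, and layer sizes are illustrative choices rather than the paper's exact setup.

```python
# Sketch of a Transformer encoder + BiLSTM head for token-level ADE tagging.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class TransformerBiLSTMTagger(nn.Module):
    def __init__(self, model_name="bert-base-uncased", num_labels=3, lstm_dim=128):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.lstm = nn.LSTM(hidden, lstm_dim, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * lstm_dim, num_labels)  # e.g. O / B-ADE / I-ADE

    def forward(self, input_ids, attention_mask):
        states = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        lstm_out, _ = self.lstm(states)
        return self.classifier(lstm_out)  # per-token label logits

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["the drug gave me a terrible headache"], return_tensors="pt")
model = TransformerBiLSTMTagger()
logits = model(batch["input_ids"], batch["attention_mask"])
print(logits.shape)  # (batch, sequence_length, num_labels)
```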
AILAB-Udine@SMM4H 22: Limits of Transformers and BERT Ensembles
This paper describes the models developed by the AILAB-Udine team for the
SMM4H 22 Shared Task. We explored the limits of Transformer-based models on
text classification, entity extraction and entity normalization, tackling Tasks
1, 2, 5, 6 and 10. The main take-aways we got from participating in different
tasks are: the overwhelming positive effects of combining different
architectures when using ensemble learning, and the great potential of
generative models for term normalization.
Comment: Shared Task, SMM4H, Transformer
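The ensembling take-away can be illustrated with a toy majority-vote sketch; the hard-coded predictions below stand in for the outputs of separately trained Transformer classifiers.

```python
# Toy sketch of ensemble learning by majority vote over per-model predictions.
from collections import Counter

def majority_vote(predictions_per_model):
    """predictions_per_model: list of prediction lists, one per model."""
    ensembled = []
    for votes in zip(*predictions_per_model):
        ensembled.append(Counter(votes).most_common(1)[0][0])
    return ensembled

model_a = ["ADE", "noADE", "ADE"]
model_b = ["ADE", "ADE", "noADE"]
model_c = ["noADE", "ADE", "ADE"]
print(majority_vote([model_a, model_b, model_c]))  # -> ['ADE', 'ADE', 'ADE']
```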
Representing Verbs with Rich Contexts: an Evaluation on Verb Similarity
Several studies on sentence processing suggest that the mental lexicon keeps
track of the mutual expectations between words. Current DSMs, however,
represent context words as separate features, thereby losing important
information for word expectations, such as word interrelations. In this paper,
we present a DSM that addresses this issue by defining verb contexts as joint
syntactic dependencies. We test our representation in a verb similarity task
on two datasets, showing that joint contexts achieve performance comparable
to, or even better than, single dependencies. Moreover, they are able to
overcome the data sparsity problem of joint feature spaces, in spite of the
limited size of our training corpus.
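A hedged sketch of the joint-context idea: rather than counting each dependency of a verb as a separate feature, the whole tuple of its syntactic arguments is counted as one feature, and verbs are compared by cosine over these joint-context counts. The toy parsed triples below are invented for illustration.

```python
# Joint syntactic contexts for verbs: each (subject, object) tuple is a single
# feature of the verb's count vector; verbs are compared by cosine similarity.
from collections import Counter
import math

# Toy dependency-parsed occurrences: (verb, subject, object).
occurrences = [
    ("eat", "dog", "bone"), ("eat", "child", "apple"),
    ("devour", "dog", "bone"), ("devour", "lion", "zebra"),
]

def joint_context_vector(verb):
    return Counter((subj, obj) for v, subj, obj in occurrences if v == verb)

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[f] * b[f] for f in a)
    norm = (math.sqrt(sum(x * x for x in a.values()))
            * math.sqrt(sum(x * x for x in b.values())))
    return dot / norm if norm else 0.0

print(cosine(joint_context_vector("eat"), joint_context_vector("devour")))
```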